Generating segment durations in a text-to-speech system: a hybrid rule-based/neural network approach

Authors

Gerald Corrigan

Noel Massey

Orhan Karaali

Abstract

A combination of a neural network with rule firing information from a rule-based system is used to generate segment durations for a text-to-speech system. The system shows a slight improvement in performance over a neural network system without the rule firing information. Synthesized speech using the generated segment durations was accepted by listeners as having about the same quality as speech generated using segment durations extracted from natural speech.
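The abstract does not say how the rule firing information enters the network. One plausible reading, sketched here only as an illustration, is that a 0/1 vector marking which duration rules fired for a segment is appended to the segment's other input features before a small feed-forward network predicts the duration. Every name, feature, layer size, and weight below is a hypothetical placeholder, not the authors' design.

```python
import math
import random


def build_input(context_features, rule_fired):
    """Append a 0/1 rule-firing vector to the other segment features.

    Both arguments are hypothetical: `context_features` stands in for
    whatever phonetic/contextual inputs the network uses, `rule_fired`
    for one flag per rule in the rule-based system.
    """
    context = [float(v) for v in context_features]
    rules = [1.0 if fired else 0.0 for fired in rule_fired]
    return context + rules


def predict_duration(x, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer forward pass returning a positive duration."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    # Numerically stable softplus, log(1 + e^z), keeps the output > 0.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Toy example with untrained, randomly initialised weights (assumption).
random.seed(0)
context = [1.0, 0.0, 0.5]   # hypothetical context features
rule_fired = [1, 0, 0, 1]   # hypothetical: rules 0 and 3 fired
x = build_input(context, rule_fired)

n_hidden = 5
w_hidden = [[random.gauss(0.0, 1.0) for _ in x] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [random.gauss(0.0, 1.0) for _ in range(n_hidden)]
b_out = 0.0

duration = predict_duration(x, w_hidden, b_hidden, w_out, b_out)
```

Dropping the `rule_fired` part of the input gives the plain neural network baseline that the abstract compares against; the weights here are untrained, so the predicted value itself carries no meaning.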

Similar resources

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Learning the parameters of quantitative prosody models

The article introduces a novel hybrid data driven and rule based approach for the prosody control in a TTS system, which combines the advantages of well-balanced, quantitative models with the flexible training of derived model parameters. Instancing the training of Fujisaki intonation parameters for German (MFGI) the article describes the hybrid data driven and rule based architecture HYDRA, th...

A hybrid approach to supplier performance evaluation using artificial neural network: a case study in automobile industry

For many years, purchasing and supplier performance evaluation have been discussed in both academic and industrial circles to improve buyer-supplier relationship. In this study, a novel model is presented to evaluate supplier performance according to different purchasing classes. In the proposed method, clustering analysis is applied to develop purchasing portfolio model using available data in...

An RNN-based prosodic information synthesizer for Mandarin text-to-speech

A new RNN-based prosodic information synthesizer for Mandarin Chinese text-to-speech (TTS) is proposed in this paper. Its four-layer recurrent neural network (RNN) generates prosodic information such as syllable pitch contours, syllable energy levels, syllable initial and final durations, as well as intersyllable pause durations. The input layer and first hidden layer operate with a word-synchr...

Modelling the temporal structure of newsreaders' speech on neural networks for Estonian text-to-speech synthesis

Generation of natural-sounding synthetic speech from a text requires perfect control over the temporal structure of speech flow. The present paper describes an attempt to replace the rule-based durational model, hitherto used in Estonian text-to-speech synthesis, by neural networks (NN). For this aim, fluent speech of radio announcers and newsreaders was analysed and its temporal structure was m...

Journal title: CoRR

Volume: cs.NE/9811030

Publication date: 1997
